FFmpeg学习：音频重采样

您所在的位置：网站首页 › ffmpeg resample视频 › FFmpeg学习：音频重采样

FFmpeg学习：音频重采样

2023-12-18 14:28| 来源: 网络整理| 查看: 265

重采样：

将音频进行SDL播放的时候，因为当前的SDL2.0不支持plannar格式，也不支持浮点型的，而最新的FFpemg会将音频解码为AV_SAMPLE_FMT_FLTP，这个时候进行对它重采样的话，就可以在SDL2.0上进行播放这个音频了。

重采样参数 1、sample rate(采样率)：采样设备每秒抽取样本的次数 2、sample format(采样格式)和量化精度：每种⾳频格式有不同的量化精度（位宽），位数越多，表示值就越精确，声⾳表现⾃然就越精准。 3、channel layout（通道布局，也就是声道数）：这个就是采样的声道数

在FFmpeg里面主要有两种采样格式:floating-point formats 和 planar sample formats；具体采样参数如下(在（libavutil/samplefmt.h头文件里面)：

enum AVSampleFormat { AV_SAMPLE_FMT_NONE = -1, AV_SAMPLE_FMT_U8, ///< unsigned 8 bits AV_SAMPLE_FMT_S16, ///< signed 16 bits AV_SAMPLE_FMT_S32, ///< signed 32 bits AV_SAMPLE_FMT_FLT, ///< float AV_SAMPLE_FMT_DBL, ///< double AV_SAMPLE_FMT_U8P, ///< unsigned 8 bits, planar AV_SAMPLE_FMT_S16P, ///< signed 16 bits, planar AV_SAMPLE_FMT_S32P, ///< signed 32 bits, planar AV_SAMPLE_FMT_FLTP, ///< float, planar AV_SAMPLE_FMT_DBLP, ///< double, planar AV_SAMPLE_FMT_S64, ///< signed 64 bits AV_SAMPLE_FMT_S64P, ///< signed 64 bits, planar AV_SAMPLE_FMT_NB ///< Number of sample formats. DO NOT USE if linking dynamically };

说明：

1.U8（无符号整型8bit）、S16（整型16bit）、S32（整型32bit）、FLT（单精度浮点类型）、DBL（双精度浮点类型）、S64（整型64bit），不以P为结尾的都是interleaved（packed）结构，以P为结尾的是planar结构。 2.Planar模式是FFmpeg内部存储模式，我们实际使用的音频文件都是Packed模式的。 3.FFmpeg解码不同格式的音频输出的音频采样格式不是一样。测试发现，其中AAC解码输出的数据为浮点型的 AV_SAMPLE_FMT_FLTP 格式，MP3解码输出的数据为 AV_SAMPLE_FMT_S16P 格式（使用的mp3文件为16位深）。具体采样格式可以查看解码后的AVFrame中的 format 成员或解码器的AVCodecContext中的 sample_fmt 成员。

还有就是声道分布参数，这个在FFmpeg也有说明(声道分布在FFmpeg\libavutil\channel_layout.h中有定义，⼀般来说⽤的⽐较多的是 AV_CH_LAYOUT_STEREO（双声道）和AV_CH_LAYOUT_SURROUND（三声道），这两者的定义如下)：

#define AV_CH_LAYOUT_STEREO (AV_CH_FRONT_LEFT|AV_CH_FRONT_RIGHT) #define AV_CH_LAYOUT_SURROUND (AV_CH_LAYOUT_STEREO|AV_CH_FRONT_CENTER) FFmpeg重采样流程

重采样的处理流程： 1、创建上下文环境：重采样过程上下文环境为SwrContext数据结构。 2、参数设置：转换的参数设置到SwrContext中。SwrContext初始化：swr_init()。 3、分配样本数据内存空间：使用av_samples_alloc_array_and_samples、av_samples_alloc等工具函数。 4、开启重采样转换：通过重复地调用swr_convert来完成。 5、重采样转换完成，释放相关资源：通过swr_free()释放SwrContext。

流程图如下：

FFmpeg重采样流程（方法二）（采用swr_convert_frame()进行重采样）

重采样的处理流程： 1、创建上下文环境：重采样过程上下文环境为SwrContext数据结构。（和上面一样） 2、参数设置：转换的参数设置到SwrContext中。SwrContext初始化：swr_init()。 3、直接采用swr_convert_frame()对输入帧和输出帧进行重采样。（不需要提前申请空间，详细看函数介绍） 4、重采样转换完成，释放相关资源：通过swr_free()释放SwrContext。

FFmpeg采样流程相关api介绍【第一】音频重采样上下文——结构体SwrContext

【结构声明】

点击查看代码 struct SwrContext { const AVClass *av_class; ///< AVClass used for AVOption and av_log() int log_level_offset; ///< logging level offset void *log_ctx; ///< parent logging context enum AVSampleFormat in_sample_fmt; ///< input sample format enum AVSampleFormat int_sample_fmt; ///< internal sample format (AV_SAMPLE_FMT_FLTP or AV_SAMPLE_FMT_S16P) enum AVSampleFormat out_sample_fmt; ///< output sample format int64_t in_ch_layout; ///< input channel layout int64_t out_ch_layout; ///< output channel layout int in_sample_rate; ///< input sample rate int out_sample_rate; ///< output sample rate int flags; ///< miscellaneous flags such as SWR_FLAG_RESAMPLE float slev; ///< surround mixing level float clev; ///< center mixing level float lfe_mix_level; ///< LFE mixing level float rematrix_volume; ///< rematrixing volume coefficient float rematrix_maxval; ///< maximum value for rematrixing output int matrix_encoding; /**< matrixed stereo encoding */ const int *channel_map; ///< channel index (or -1 if muted channel) map int used_ch_count; ///< number of used input channels (mapped channel count if channel_map, otherwise in.ch_count) int engine; int user_in_ch_count; ///< User set input channel count int user_out_ch_count; ///< User set output channel count int user_used_ch_count; ///< User set used channel count int64_t user_in_ch_layout; ///< User set input channel layout int64_t user_out_ch_layout; ///< User set output channel layout enum AVSampleFormat user_int_sample_fmt; ///< User set internal sample format int user_dither_method; ///< User set dither method struct DitherContext dither; int filter_size; /**< length of each FIR filter in the resampling filterbank relative to the cutoff frequency */ int phase_shift; /**< log2 of the number of entries in the resampling polyphase filterbank */ int linear_interp; /**< if 1 then the resampling FIR filter will be linearly interpolated */ int exact_rational; /**< if 1 then enable non power of 2 phase_count */ double cutoff; /**< resampling cutoff frequency (swr: 6dB point; soxr: 0dB point). 1.0 corresponds to half the output sample rate */ int filter_type; /**< swr resampling filter type */ double kaiser_beta; /**< swr beta value for Kaiser window (only applicable if filter_type == AV_FILTER_TYPE_KAISER) */ double precision; /**< soxr resampling precision (in bits) */ int cheby; /**< soxr: if 1 then passband rolloff will be none (Chebyshev) & irrational ratio approximation precision will be higher */ float min_compensation; ///< swr minimum below which no compensation will happen float min_hard_compensation; ///< swr minimum below which no silence inject / sample drop will happen float soft_compensation_duration; ///< swr duration over which soft compensation is applied float max_soft_compensation; ///< swr maximum soft compensation in seconds over soft_compensation_duration float async; ///< swr simple 1 parameter async, similar to ffmpegs -async int64_t firstpts_in_samples; ///< swr first pts in samples int resample_first; ///< 1 if resampling must come first, 0 if rematrixing int rematrix; ///< flag to indicate if rematrixing is needed (basically if input and output layouts mismatch) int rematrix_custom; ///< flag to indicate that a custom matrix has been defined AudioData in; ///< input audio data AudioData postin; ///< post-input audio data: used for rematrix/resample AudioData midbuf; ///< intermediate audio data (postin/preout) AudioData preout; ///< pre-output audio data: used for rematrix/resample AudioData out; ///< converted output audio data AudioData in_buffer; ///< cached audio data (convert and resample purpose) AudioData silence; ///< temporary with silence AudioData drop_temp; ///< temporary used to discard output int in_buffer_index; ///< cached buffer position int in_buffer_count; ///< cached buffer length int resample_in_constraint; ///< 1 if the input end was reach before the output end, 0 otherwise int flushed; ///< 1 if data is to be flushed and no further input is expected int64_t outpts; ///< output PTS int64_t firstpts; ///< first PTS int drop_output; ///< number of output samples to drop double delayed_samples_fixup; ///< soxr 0.1.1: needed to fixup delayed_samples after flush has been called. struct AudioConvert *in_convert; ///< input conversion context struct AudioConvert *out_convert; ///< output conversion context struct AudioConvert *full_convert; ///< full conversion context (single conversion for input and output) struct ResampleContext *resample; ///< resampling context struct Resampler const *resampler; ///< resampler virtual function table double matrix[SWR_CH_MAX][SWR_CH_MAX]; ///< floating point rematrixing coefficients float matrix_flt[SWR_CH_MAX][SWR_CH_MAX]; ///< single precision floating point rematrixing coefficients uint8_t *native_matrix; uint8_t *native_one; uint8_t *native_simd_one; uint8_t *native_simd_matrix; int32_t matrix32[SWR_CH_MAX][SWR_CH_MAX]; ///< 17.15 fixed point rematrixing coefficients uint8_t matrix_ch[SWR_CH_MAX][SWR_CH_MAX+1]; ///< Lists of input channels per output channel that have non zero rematrixing coefficients mix_1_1_func_type *mix_1_1_f; mix_1_1_func_type *mix_1_1_simd; mix_2_1_func_type *mix_2_1_f; mix_2_1_func_type *mix_2_1_simd; mix_any_func_type *mix_any_f; /* TODO: callbacks for ASM optimizations */ }; 比较重要的参数： + 采样格式 + 声道类型

【结构体定义和申请空间】

SwrContext* audio_convert_ctx = NULL; audio_convert_ctx = swr_alloc(); 【第二】上下文结构体参数设置

参数设置有两种方式：【方法一】：采用AVOptions的api，举例如下

av_opt_set_channel_layout(audio_convert_ctx, "in_channel_layout", AV_CH_LAYOUT_5POINT1, 0); av_opt_set_channel_layou(audio_convert_ctx, "out_channel_layout", AV_CH_LAYOUT_STEREO, 0); av_opt_set_int(audio_convert_ctx, "in_sample_rate", 44100, 0); av_opt_set_int(audio_convert_ctx, "out_sample_rate", 16000, 0); av_opt_set_sample_fmt(audio_convert_ctx, "in_sample_fmt", AV_SAMPLE_FMT_FLPT, 0); av_opt_set_sample_fmt(audio_convert_ctx, "out_sample_fmt", AV_SAMPLE_FMT_S16, 0);

【方法二】：采用swr_alloc_set_opts()：如果第一个参数为NULL则创建一个新的SwrContext，否则对已有的SwrContext进行参数设置。

struct SwrContext *swr_alloc_set_opts(struct SwrContext *s, // ⾳频重采样上下⽂ int64_t out_ch_layout, // 输出的layout, 如：5.1声道 enum AVSampleFormat out_sample_fmt, // 输出的采样格式。 Float, S16,⼀般选⽤是s16 绝⼤部分声卡⽀持 int out_sample_rate, //输出采样率 int64_t in_ch_layout, // 输⼊的layout enum AVSampleFormat in_sample_fmt, // 输⼊的采样格式 int in_sample_rate, // 输⼊的采样率 int log_offset, // ⽇志相关，不⽤管先，直接为0 void *log_ctx // ⽇志相关，不⽤管先，直接为NULL ); 【第三】初始化SwrContext结构体

参数设置好之后必须调用swr_init()对SwrContext进行初始化。

int swr_init(struct SwrContext *s);

如果需要修改转换的参数：

1、重新进行参数设置。 2、再次调用swr_init()。【第四】分配采样内存空间

转换之前需要分配内存空间用于保存重采样的输出数据，内存空间的大小跟通道个数、样本格式需要、容纳的样本个数都有关系。主要有两种方式：【方式一】：输出目标是Frame->data 采用av_frame_get_buffer()函数，举例如下：

AVFrame* pFrame_new; pFrame_new = av_frame_alloc(); pFrame_new->channel_layout = audio_decodec_ctx->channel_layout; pFrame_new->channels = av_get_channel_layout_nb_channels(audio_decodec_ctx->channel_layout); pFrame_new->format = t->inner_sample_format; pFrame_new->sample_rate = t->inner_sample_rate; int64_t nb_samples = av_rescale_rnd(pFrame->nb_samples, pFrame_new->sample_rate, pFrame->sample_rate, AV_ROUND_UP); pFrame_new->nb_samples = nb_samples; //该帧采样点数 av_frame_get_buffer(pFrame_new, 0); pFrame_new->pts = av_rescale_rnd(pFrame->pts, pFrame_new->sample_rate, pFrame->sample_rate, AV_ROUND_UP);

【方式二】：输出目标直接是内存空间 libavutil中的samples处理API提供了一些函数方便管理样本数据，例如av_samples_alloc()函数用于分配存储sample的buffer。

/** * @param[out] audio_data 输出数组，每个元素是指向一个通道的数据的指针。 * @param[out] linesize aligned size for audio buffer(s), may be NULL * @param nb_channels 通道的个数。 * @param nb_samples 每个通道的样本个数。 * @param align buffer size alignment (0 = default, 1 = no alignment) * @return 成功返回大于0的数，错误返回负数。 */ int av_samples_alloc(uint8_t **audio_data, int *linesize, int nb_channels, int nb_samples, enum AVSampleFormat sample_fmt, int align);

也有计算所需内存空间的函数av_samples_get_buffer_size()

/* *获取给定音频参数所需的缓冲区大小。 * @param [out] linesize计算的lineize，可能为NULL * @param nb_channels频道数 * @param nb_samples单个通道中的样本数 * @param sample_fmt样本格式 * @param对齐缓冲区大小对齐（0 =默认，1 =无对齐） * @return需要缓冲区大小，或失败时出现负错误代码 */ int av_samples_get_buffer_size(int *linesize, int nb_channels, int nb_samples, enum AVSampleFormat sample_fmt, int align); 【第五-1】将输⼊的⾳频按照定义的参数进⾏转换并输出 Swr_convert()

【函数定义】

int swr_convert(struct SwrContext *s, // ⾳频重采样的上下⽂ uint8_t **out, // 输出的指针。传递的输出的数组（frame的数据指针） int out_count, //输出缓冲区每通道样本数据数量(对于音频,每个通道数据长度都相同),注意这里不是以字节为单位 const uint8_t **in , //输⼊的数组，AVFrame解码出来的DATA int in_count // 输⼊的单通道的样本数量。这里填入frame->nb_samples即可 );

【函数说明】

0、返回值:转换成功后每个通道的输出样本数，出错则为负值 1、如果没有提供足够的空间用于保存输出数据，采样数据会缓存在swr中。可以通过 swr_get_out_samples()来获取下一次调用swr_convert在给定输入样本数量下输出样本数量的上限，来提供足够的空间。 2、如果是采样频率转换，转换完成后采样数据可能会缓存在swr中，它期待你提供更多的输入数据。 3、如果实际上并不需要更多输入数据，通过调用swr_convert()，其中参数in_count设置为0来获取缓存在swr中的数据。 4、转换结束之后需要冲刷swr_context的缓冲区，通过调用swr_convert()，其中参数in设置为NULL，参数in_count设置为0。注释里面有一句： in and in_count can be set to 0 to flush the last few samples out at the end. 【举例】 ret = swr_convert(audio_convert_ctx, &outBuff, nb_samples, (const uint8_t**)audio_frame->data, audio_frame->nb_samples); while (ret > 0) { // 把最后几个样本冲洗出来 ret = swr_convert(audio_convert_ctx, &outBuff, nb_samples, NULL, 0); } 【第五-2】采用swr_convert_frame() 输入输出帧的转换

【函数定义】

int swr_convert_frame(SwrContext *swr, AVFrame *output, const AVFrame *input);

【参数说明】输入：

swr：重采样上下文结构体 output：输出帧 input：输入帧输出： 0 成功，AVERROR 失败或不匹配的配置。

【函数说明】

转换输入 AVFrame 中的样本，并将其写入输出 AVFrame。输入和输出 AVFrames 必须设置 channel_layout、sample_rate 和 format。如果输出 AVFrame 没有分配数据指针，则将自动调用 av_frame_get_buffer() 来设置 nb_samples 字段以分配帧。输出的 AVFrame 可以为 NULL 或分配的样本少于所需的样本。在这种情况下，任何未写入输出的剩余样本将被添加到内部 FIFO 缓冲区，以在下次调用此函数或 swr_convert() 时返回。如果转换采样率，内部重采样延迟缓冲区中可能有剩余数据。 swr_get_delay() 告诉剩余样本的数量。要将此数据作为输出，请使用 NULL 输入调用此函数或 swr_convert()。如果 SwrContext 配置与输出和输入 AVFrame 设置不匹配，则不会进行转换，取决于哪个 AVFrame 不匹配 AVERROR_OUTPUT_CHANGED、AVERROR_INPUT_CHANGED 或它们的按位或结果。【第六】释放资源

首先释放内存(frame->data可以通过av_frame_free()时释放)：

av_freep(outBuffer);

在所有转换完成之后，释放上下文：

void swr_free(struct SwrContext **s) ⾳频重采样，采样格式转换和混合库点击查看代码 // // 音频帧格式转换 // int32_t AudioConvert( const AVFrame* pInFrame, // 输入音频帧 AVSampleFormat eOutSmplFmt, // 输出音频格式 int32_t nOutChannels, // 输出音频通道数 int32_t nOutSmplRate, // 输出音频采样率 AVFrame** ppOutFrame) // 输出视频帧 { struct SwrContext* pSwrCtx = nullptr; AVFrame* pOutFrame = nullptr; // 创建格式转换器, int64_t nInChnlLayout = av_get_default_channel_layout(pInFrame->channels); int64_t nOutChnlLayout = (nOutChannels == 1) ? AV_CH_LAYOUT_MONO : AV_CH_LAYOUT_STEREO; pSwrCtx = swr_alloc(); if (pSwrCtx == nullptr) { LOGE(" [ERROR] fail to swr_alloc()\n"); return -1; } swr_alloc_set_opts(pSwrCtx, nOutChnlLayout, eOutSmplFmt, nOutSmplRate, nInChnlLayout, (enum AVSampleFormat)(pInFrame->format), pInFrame->sample_rate, 0, nullptr ); // 计算重采样转换后的样本数量,从而分配缓冲区大小 int64_t nCvtBufSamples = av_rescale_rnd(pInFrame->nb_samples, nOutSmplRate, pInFrame->sample_rate, AV_ROUND_UP); // 创建输出音频帧 pOutFrame = av_frame_alloc(); pOutFrame->format = eOutSmplFmt; pOutFrame->nb_samples = (int)nCvtBufSamples; pOutFrame->channel_layout = (uint64_t)nOutChnlLayout; int res = av_frame_get_buffer(pOutFrame, 0); // 分配缓冲区 if (res < 0) { LOGE(" [ERROR] fail to av_frame_get_buffer(), res=%d\n", res); swr_free(&pSwrCtx); av_frame_free(&pOutFrame); return -2; } // 进行重采样转换处理,返回转换后的样本数量 int nCvtedSamples = swr_convert(pSwrCtx, const_cast(pOutFrame->data), (int)nCvtBufSamples, const_cast(pInFrame->data), pInFrame->nb_samples); if (nCvtedSamples nb_samples = nCvtedSamples; pOutFrame->pts = pInFrame->pts; // pts等时间戳沿用 pOutFrame->pkt_pts = pInFrame->pkt_pts; (*ppOutFrame) = pOutFrame; swr_free(&pSwrCtx); // 释放转换器 return 0; } delay @see swr_delay() @see swr_convert() @see swr_get_delay() 补充 channels 和 channel_layout 是啥 channels 为音频的通道数 1 2 3 4 5..... channel_layout 为音频通道格式类型如单通道双通道 .....

相互之间获取

av_get_channel_layout_nb_channels() av_get_default_channel_layout()

对应关系

点击查看代码 channel_layout_map[] { "mono", 1, AV_CH_LAYOUT_MONO }, { "stereo", 2, AV_CH_LAYOUT_STEREO }, { "2.1", 3, AV_CH_LAYOUT_2POINT1 }, { "3.0", 3, AV_CH_LAYOUT_SURROUND }, { "3.0(back)", 3, AV_CH_LAYOUT_2_1 }, { "4.0", 4, AV_CH_LAYOUT_4POINT0 }, { "quad", 4, AV_CH_LAYOUT_QUAD }, { "quad(side)", 4, AV_CH_LAYOUT_2_2 }, { "3.1", 4, AV_CH_LAYOUT_3POINT1 }, { "5.0", 5, AV_CH_LAYOUT_5POINT0_BACK }, { "5.0(side)", 5, AV_CH_LAYOUT_5POINT0 }, { "4.1", 5, AV_CH_LAYOUT_4POINT1 }, { "5.1", 6, AV_CH_LAYOUT_5POINT1_BACK }, { "5.1(side)", 6, AV_CH_LAYOUT_5POINT1 }, { "6.0", 6, AV_CH_LAYOUT_6POINT0 }, { "6.0(front)", 6, AV_CH_LAYOUT_6POINT0_FRONT }, { "hexagonal", 6, AV_CH_LAYOUT_HEXAGONAL }, { "6.1", 7, AV_CH_LAYOUT_6POINT1 }, { "6.1", 7, AV_CH_LAYOUT_6POINT1_BACK }, { "6.1(front)", 7, AV_CH_LAYOUT_6POINT1_FRONT }, { "7.0", 7, AV_CH_LAYOUT_7POINT0 }, { "7.0(front)", 7, AV_CH_LAYOUT_7POINT0_FRONT }, { "7.1", 8, AV_CH_LAYOUT_7POINT1 }, { "7.1(wide)", 8, AV_CH_LAYOUT_7POINT1_WIDE }, { "octagonal", 8, AV_CH_LAYOUT_OCTAGONAL }, { "downmix", 2, AV_CH_LAYOUT_STEREO_DOWNMIX, }, 参考

FFmpeg之重采样总结音频属性相关：声道、采样率、采样位数、样本格式、比特率 FFMPEG开发快速入坑——音频转换处理 channel_layout

【本文地址】

FFmpeg学习：音频重采样

FFmpeg学习：音频重采样

今日新闻

推荐新闻